Statistical Significance of Tree Similarity Scores
Authors
Abstract
New methodologies for pairwise tree matching on carbohydrate sugar chain data were introduced in [3]. That work extended well-known sequence alignment algorithms [12] and used an existing polynomial-time graph algorithm for finding the maximum common subtree (MCST) of two trees [7] to implement the KEGG Carbohydrate Matcher, or KCaM. These methodologies are currently available on the web via KEGG Glycan [9, 15]. We also note some appealing work related not only to KCaM but to biological tree-structure matching in general.
Similar resources
Empirical statistical estimates for sequence similarity searches.
The FASTA package of sequence comparison programs has been modified to provide accurate statistical estimates for local sequence similarity scores with gaps. These estimates are derived using the extreme value distribution from the mean and variance of the local similarity scores of unrelated sequences after the scores have been corrected for the expected effect of library sequence length. This...
On the statistical significance of nucleic acid similarities
When evaluating sequence similarities among nucleic acids by the usual methods, statistical significance is often found when the biological significance of the similarity is dubious. We demonstrate that the known statistical properties of nucleic acid sequences strongly affect the statistical distribution of similarity values when calculated by standard procedures. We propose a series of models...
The statistical distribution of nucleic acid similarities.
All pairs of a large set of known vertebrate DNA sequences were searched by computer for most similar segments. Analysis of this data shows that the computed similarity scores are distributed proportionally to the logarithm of the product of the lengths of the sequences involved. This distribution is closely related to recent results of Erdos and others on the longest run of heads in coin tossi...
When is Chemical Similarity Significant? The Statistical Distribution of Chemical Similarity Scores and Its Extreme Values
As repositories of chemical molecules continue to expand and become more open, it becomes increasingly important to develop tools to search them efficiently and assess the statistical significance of chemical similarity scores. Here, we develop a general framework for understanding, modeling, predicting, and approximating the distribution of chemical similarity scores and its extreme values in ...
8th Annual Institute for Genomics & Bioinformatics (IGB) Biomedical Informatics Training (BIT) Program Symposium
As repositories of chemical molecules continue to expand and become more open, it becomes increasingly important to develop tools to search them efficiently and assess the statistical significance of chemical similarity scores. Here we develop a general framework for understanding, modeling, predicting, and approximating the distribution of chemical similarity scores and its extreme values in l...
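The FASTA entry listed above describes deriving significance estimates from the extreme value distribution using the mean and variance of similarity scores of unrelated sequences. A minimal Python sketch of that general idea follows, assuming a method-of-moments fit of a Gumbel (type I extreme value) distribution. The function names `fit_gumbel_moments` and `gumbel_p_value` and the sample scores are hypothetical, and the library-length correction mentioned in that abstract is deliberately left out. It is an illustration under these assumptions, not the procedure used by FASTA or KCaM.

```python
import math
from statistics import mean, variance

# Euler-Mascheroni constant, which appears in the mean of a Gumbel distribution.
EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) by the method of moments.

    Assumes `scores` are similarity scores of unrelated pairs (hypothetical input).
    Uses: variance = (pi^2 / 6) * beta^2 and mean = mu + gamma * beta.
    """
    var = variance(scores)  # sample variance; needs at least two scores
    if var <= 0:
        raise ValueError("scores must not all be identical")
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """Return P(S >= score) under the fitted Gumbel distribution."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 for accuracy when the result is small.
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    # Made-up background scores, for illustration only.
    background = [10.0, 12.0, 11.0, 13.0, 9.0, 14.0, 10.0, 12.0]
    mu, beta = fit_gumbel_moments(background)
    print(f"mu={mu:.3f} beta={beta:.3f} p(20)={gumbel_p_value(20.0, mu, beta):.4g}")
```

The method of moments is chosen here only because it matches the "mean and variance" wording of the abstract; a maximum-likelihood fit would be an equally plausible alternative.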